HPAVC

home *** CD-ROM | disk | FTP | other *** search

/ HPAVC / HPAVC CD-ROM.iso / 3DGPL.ZIP / 3DGPL / TEXT / 3.TXT < prev next >

Wrap

Text File | 1995-06-22 | 49KB | 1,179 lines

2-D rasterization. -------------------- Re graphics: A picture is worth 10K words - but only those to describe the picture. Hardly any sets of 10K words can be adequately described with pictures. Rasterization, the term itself refers to the technology we are using to represent images: turning it into mosaic of small square pixel cells set to different colours. The algorithms performing just that are covered in this section. A dot. ------ Rasterizing dot is very straightforward, I am not even sure it is warranted to call it rasterization at all. All it involves is just setting some location in the screen memory to particular colour. Let's recall that the very first location of the colourmap describes top rightmost pixel, that determines the system of references which is convenient in this case with X axis directed to the right and Y axis directed downward. HW_X_SIZE (0,0)+-+-+-+- ... -+-------> X | | | | +-+-+ + | HW_Y_SIZE ... | +-+- +-+ | | |*|<------y +-+-+ +-+ | ^ | | V | Y x pic 3.1 Assuming that each pixel is represented by one byte, and that the dimensions of the output surface is HW_SCREEN_X_SIZE by HW_SCREEN_Y_SIZE it is the y*HW_SCREEN_X_SIZE+x location we have to access in order to plot a dot at (x,y). Here's some code to do just that: unsigned char *G_buffer; /* colourmap- offscreen buffer */ void G_dot(int x,int y,unsigned char colour) { G_buffer[y*HW_SCREEN_X_SIZE+x]=colour; } A line. ------- Line rasterization is a very important peace of code for the library. It would also be used for polygon rendering, and few other routines. What we have to do is set to some specified colour all dots on the line's path. We can easily calculate coordinates of all dots belonging to this line: * (xend,yend) --- / | * (x,y) dy / | / |y | (xstart,ystart) *-----+ --------- x | | | |- -dx- -| pic 3.2 Remembering a little bit of high (high?) school geometry, it can be seen that if: dx = xend-xstart dy = yend-ystart then: x dx x * dy --- = ---- and y = -------- y dy dx Since we want to set all pixels along the path we just have to take all Xs in the range of [xstart, xend] and calculate respective Y. There is a bit of a catch, we would have only as much as xend-xstart calculated coordinates. But if the Y range was actually longer the path we have just found won't be continuous pic 3.3. In this case we should have been calculating coordinates taking values from the longer range [ystart, yend] and getting respective X, pic 3.4 as: y * dx x = -------- dy ....* ....* ..... ....* ...*. ...*. ..... ...*. ..*.. yrange ..*.. yrange ..... ..*.. .*... .*... ..... .*... *.... *.... xrange xrange pic 3.3 pic 3.4 What we just did is quite logical and easily programmable algorithm, but... it is not that commonly used, the reason has to do with the way we did the calculation, it involved division, and that is by far the slowest operation any computer does. What we would do instead doesn't involve a single division. It is called a Bresenham's integer based algorithm (Bresenham, 1965) and here we find all points using few logical operations and integer addition. The idea is to find a bigger range among [xstart,xend] and [ystart,yend] go along points in it and have a variable saying when it is time to advance in the other range. Let's consider what is happening at some stage in line rasterization, let's assume X is a longer range (dx>dy) and that with the line we are rendering xstart<xend and ystart<yend: | | | H(x+1,y+1) -+---|---*- | |h/ | | | * I(x+1,real_y) | /| + - - - - + - - - - + - | / | | | / | |l | / | -*---|---*-L(x+1,y) |P(x,y) | pic 3.5 Pic 3.5 shows the situation where we just rendered a pixel P (Previous) at coordinates (x,y) and now have to make a decision where to go to next, to pixel L(x+1,y) (Lower) or H(x+1,y+1) (Higher). points P,H,L actually represent middles of respective pixels. Since actual line would pass somewhere in the middle between those two points we must plot the one whose centre is closer to the intersection point I(x+1,real_y). this can be measured by comparing segments h and l, resulted from the intersection: remembering dependency used in the previous method we can find that: dy * (x+1) real_y = ------------ dx and: dy * (x+1) h = y+1-real_y => h = y+1 - ------------ dx dy * (x+1) l = real_y-y => l = ------------ - y dx Now, we are interested in comparing l and h: dy * (x+1) l - h = 2 ------------ - 2y-1 dx So, if l-h>0 it means that l>h the intersection point L is closer to point H and the later should be rendered, otherwise L should be rendered. let's multiply both sides by dx: dx(l - h)= 2dyx + 2dy - 2dxy - dx Now, since dx assumed bigger then 0, signs of (l-h) and dx(l-h) would be the same. Let's call dx(l-h) as d and find it's sign and value at some step i and next step i+1: d = 2dyx -2dxy + 2dy - dx i i-1 i-1 d = 2dyx -2dxy + 2dy - dx i+1 i i We can also find the initial value d, in the very first point since before that x==0 and y==0: d = 2dy-dx 1 ------------- Since sign of d determines which point to move next (H or L) lets find value of d at some step i assuming we know it's value at i-1 from previous calculations: d - d = 2dyx - 2dxy - 2dyx + 2dxy i+1 i i i i-1 i-1 d - d = 2dy(x - x ) - 2dx(y - y ) i+1 i i i-1 i i-1 Depending on the decision taken in the previous point value of d in the next one would be either: when H was chosen since x - x =1 and y - y =1 i i-1 i i-1 d - d = 2dy - 2dx i+1 i d = d + 2dy - 2dx i+1 i --------------------- or: when L was chosen since x - x =1 and y - y =0 i i-1 i i-1 d -d = 2dy i+1 i d =d + 2dy i+1 i -------------- This may seem a little complicated but the implementation code certainly doesn't look this way. We just have to find the initial value of d, and then in the loop depending on it's sign add to it either 2dy-2dx, or just 2dy. since those are constants, they can be calculated before the actual loop. In the proof above we have assumed xstart<yend ystrat<yend, however, in real life we can't guarantee that. The way we can take care of it, is just changing increments to decrements when above assumptions don't hold. We also have to remember that in the loop when L point is picked it is only value belonging to the bigger range which is incremented (decremented) whereas when picking H point both X and Y have to be changed since that is when we are advancing along both axes. void G_line(int x,int y,int x2,int y2,unsigned char colour) { int dx,dy,long_d,short_d; int d,add_dh,add_dl; register int inc_xh,inc_yh,inc_xl,inc_yl; register int i; dx=x2-x; dy=y2-y; /* ranges */ if(dx<0){dx=-dx; inc_xh=-1; inc_xl=-1;} /* making sure dx and dy >0 */ else { inc_xh=1; inc_xl=1; } /* adjusting increments */ if(dy<0){dy=-dy; inc_yh=-1; inc_yl=-1;} else { inc_yh=1; inc_yl=1; } if(dx>dy){long_d=dx; short_d=dy; inc_yl=0;}/* long range,&making sure either */ else {long_d=dy; short_d=dx; inc_xl=0;}/* x or y is changed in L case */ d=2*short_d-long_d; /* initial value of d */ add_dl=2*short_d; /* d adjustment for H case */ add_dh=2*short_d-2*long_d; /* d adjustment for L case */ for(i=0;i<=long_d;i++) /* for all points in longer range */ { G_dot(x,y,colour); /* rendering */ if(d>=0){x+=inc_xh; y+=inc_yh; d+=add_dh;}/* previous point was H type */ else {x+=inc_xl; y+=inc_yl; d+=add_dl;}/* previous point was L type */ } } As you can see this algorithm doesn't involve divisions at all. It does have multiplication by two, but almost any modern compiler would optimize it to 1 bit left shift - a very cheap operation. Or if we don't have faith in the compiler we can do it ourselves. Lets optimize this code a little bit. Whenever trying to optimize something we first have to go after performance of the code in the loop since it is something being done over and over. In this source above we can see the function call to G_dot, let's remember what is inside there,.... oh, it is y*HW_SCREEN_X_SIZE+x a multiplication... an operation similar in length to division which we just spent so much effort to avoid. Lets think back to the representation of the colourmap, if we just rendered a point and now want to render another one next to it how their addresses in the colourmap would be different? pic 3.6 +-+-+- A|*->|B +|+-+- C|V| | +-+- | | pic 3.6 Well, if it is along X axis that we want to advance we just have to add 1, to the address of pixel A to get to pixel B, if we want to advance along Y we have to add to the address of A number of bytes in the colourmap separating A and C. this number is exactly HW_SCREEN_X_SIZE that is length of one horizontal line of pixels in memory. Now, using that, let's try instead of coordinates (x,y) to calculate it's address in the colourmap: void G_line(int x,int y,int x2,int y2,unsigned char colour) { int dx,dy,long_d,short_d; int d,add_dh,add_dl; int inc_xh,inc_yh,inc_xl,inc_yl; register int inc_ah,inc_al; register int i; register unsigned char *adr=G_buffer; dx=x2-x; dy=y2-y; /* ranges */ if(dx<0){dx=-dx; inc_xh=-1; inc_xl=-1;} /* making sure dx and dy >0 */ else { inc_xh=1; inc_xl=1; } /* adjusting increments */ if(dy<0){dy=-dy; inc_yh=-HW_SCREEN_X_SIZE; inc_yl=-HW_SCREEN_X_SIZE;}/* to get to the neighboring */ else { inc_yh= HW_SCREEN_X_SIZE; /* point along Y have to add */ inc_yl= HW_SCREEN_X_SIZE;}/* or subtract this */ if(dx>dy){long_d=dx; short_d=dy; inc_yl=0;}/* long range,&making sure either */ else {long_d=dy; short_d=dx; inc_xl=0;}/* x or y is changed in L case */ inc_ah=inc_xh+inc_yh; inc_al=inc_xl+inc_yl; /* increments for point adress */ adr+=y*HW_SCREEN_X_SIZE+x; /* address of first point */ d=2*short_d-long_d; /* initial value of d */ add_dl=2*short_d; /* d adjustment for H case */ add_dh=2*short_d-2*long_d; /* d adjustment for L case */ for(i=0;i<=long_d;i++) /* for all points in longer range */ { *adr=colour; /* rendering */ if(d>=0){adr+=inc_ah; d+=add_dh;} /* previous point was H type */ else {adr+=inc_al; d+=add_dl;} /* previous point was L type */ } } We can see from this code that although the complexity of the preliminary part of the algorithm went up a bit, the code inside the loop is much shorter and simpler. There is just one thing left to add in order to make it all completely usable - clipping that is making sure our drawing doesn't go outside the physical boundaries of the colourmap, but this is covered in the next chapter. Ambient polygons ---------------- What we have to do is to fill with some colour inside the polygon's edges. The easiest and perhaps fastest way to do it is by using "scan line" algorithm. The idea behind it is to convert a polygon into a collection of usually horizontal lines and then render them one at a time. The lines have to be horizontal only because the most common colourmap would have horizontal lines contingent in memory which automatically allows to render them faster then vertical lines since increment by 1 is usually faster then addition. There is one inherited limitation to the algorithm, because at each Y heights there would be just one horizontal line only concave polygons can be rendered correctly. It can be seen that non concave polygons might need two or more lines sometimes pic 3.7 (that's by the way, in case you wondered, determines the difference in definitions between concave and non-concave polygons): *\ | \ /* y|---\ /-| | \/ | *--------* pic 3.7 How can we represent the collection of horizontal lines? obviously we just need to know at which Y heights it is, X coordinate of the point where it starts and another X coordinate of the point where it ends. So lets have one "start" and one "end" location for every Y possible on the screen. Since every line is limited by two points each belonging to certain edge, we can take one edge at a time find all points belonging to it and for each of them change the "start" and "end" at particular Y heights. So that at the end we would have coordinates of all scan lines in the polygon pic 3.8: start end +-----+---+ 1 2 3 4 5 6 7 8 | 1 | 5 |- - - - - -> *------\ | 2 | 8 | \------------* | 2 | 7 |- - - - - - -> \----------/ | 3 | 7 | \------/ | 4 | 6 |- - - - - - - -> *----* +-----+---+ pic 3.8 Initially we would put the biggest possible value in all locations of "start" array and the smallest possible in the locations of "end" array. (Those are by the way are defined in <limits.h> and it is not a bad practice to use them). Each time we've calculated the point on the edge we have to go to "start" and "end" locations at Y heights and if X of the point less then what we have in "start" write it there. Similar to that if this same X is bigger then value in "end" location we have to put it there too. Because of the way the arrays were initialized first point calculated for every Y height would change both "start" and "end" location. Let's look at the code to find Xs for each edge, so called scanning function: void GI_scan(int *edges,int dimension,int skip) { signed_32_bit cur_v[C_MAX_DIMENSIONS]; /* initial values for Z+ dims */ signed_32_bit inc_v[C_MAX_DIMENSIONS]; /* increment for Z+ dimensions */ signed_32_bit tmp; int dx,dy,long_d,short_d; register int d,add_dh,add_dl; register int inc_xh,inc_yh,inc_xl,inc_yl; int x,y,i,j; int *v1,*v2; /* first and second verteces */ v1=edges; edges+=skip; v2=edges; /* length ints in each */ if(C_line_y_clipping(&v1,&v2,dimension)) /* vertical clipping */ { dx=*v2++; dy=*v2++; /* extracting 2-D coordinates */ x=*v1++; y=*v1++; /* v2/v1 point remaining dim-2 */ dimension-=2; if(y<G_miny) G_miny=y; if(y>G_maxy) G_maxy=y; if(dy<G_miny) G_miny=dy; if(dy>G_maxy) G_maxy=dy; /* updating vertical size */ dx-=x; dy-=y; /* ranges */ if(dx<0){dx=-dx; inc_xh=-1; inc_xl=-1;} /* making sure dx and dy >0 */ else { inc_xh=1; inc_xl=1; } /* adjusting increments */ if(dy<0){dy=-dy; inc_yh=-1; inc_yl=-1;} else { inc_yh=1; inc_yl=1; } if(dx>dy){long_d=dx;short_d=dy;inc_yl=0;} /* long range,&making sure either */ else {long_d=dy;short_d=dx;inc_xl=0;} /* x or y is changed in L case */ d=2*short_d-long_d; /* initial value of d */ add_dl=2*short_d; /* d adjustment for H case */ add_dh=2*short_d-2*long_d; /* d adjustment for L case */ for(i=0;i<dimension;i++) /* for all remaining dimensions */ { tmp=v1[i]; tmp<<=G_P; cur_v[i]=tmp; /* f.p. start value */ tmp=v2[i]-v1[i]; tmp<<=G_P; /* f.p. increment */ if(long_d>0) inc_v[i]=tmp/long_d; /* if needed (non 0 lines) */ } for(i=0;i<=long_d;i++) /* for all points in longer range */ { if(x<G_start[y]) /* further then rightmost */ { G_start[y]=x; /* the begining of scan line */ for(j=0;j<dimension;j++) G_rest_start[j][y]=cur_v[j]; /* all other dimensions */ } if(G_end[y]<x) /* further the leftmost */ { G_end[y]=x; /* the end of scan line */ for(j=0;j<dimension;j++) G_rest_end[j][y]=cur_v[j]; /* and for other dimension */ } if(d>=0){x+=inc_xh;y+=inc_yh;d+=add_dh;} /* previous dot was H type */ else {x+=inc_xl;y+=inc_yl;d+=add_dl;} /* previous dot was L type */ for(j=0;j<dimension;j++) cur_v[j]+=inc_v[j]; /* for all other dimensions */ } } } As you can see this edge scanning function doesn't have much differences with our most basic line rendering code. The only thing we are updating the vertical size of polygon so that after all edges would go through this function we would have the correct vertical dimensions in G_miny and G_maxy. There is another difference: scan function is designed to process, other dimensions, that is it would interpolate X and similar to that other values specified in vertices, finding them for all Ys in range. We would need that when adding interpolative shading feature. As to the ambient polygon rendering, After scanning is over we would just have to render all horizontal scan lines in the range [G_miny,G_maxy]: void G_ambient_polygon(int *edges,int length,unsigned char colour) { int new_edges[G_MAX_TUPLES]; /* verticaly clipped polygon */ int new_length; /* although no edges there yet */ register unsigned char *l_adr,*adr; register int beg,end,i; GI_boarder_array_init(); /* initializing the arrays */ new_length=C_polygon_x_clipping(edges,new_edges,2,length); for(i=0;i<new_length;i++) GI_scan(&new_edges[i*2],2,2); /* Searching polygon boarders */ if(G_miny<=G_maxy) /* nothing to do? */ { l_adr=G_buffer+G_miny*HW_SCREEN_X_SIZE; /* addr of 1st byte of 1st line */ for(; G_miny<=G_maxy; G_miny++, l_adr+=HW_SCREEN_X_SIZE) /* rendering all lines */ { adr=l_adr+(beg=G_start[G_miny]); /* addr of the current line */ end=G_end[G_miny]; /* ends here */ for(;beg<=end;beg++) *adr++=colour; /* rendering single line */ } } } As to the optimization it doesn't appear we can do a lot without leaving comfortable boundaries of C. On the other hand when there's really nothing else we can do to speed things up, when all manuals are read and all friends are out of ideas let's do it, let's go for assembly. I don't think rewriting all functions this way is appealing. Moreover only rewriting the real execution bottleneck would give considerable speed increase. That's, perhaps, one of the reason why modern compilers have such a nice feature as inline assembly, allowing to add low-level code directly into C source. Looking at the code above we can see this line rendering loop. This is an obvious candidate, since it executes fairly often and does quite a lot of iterations, especially compared with surrounding code. Different C compilers have different provision for inline assembly. GNU C compiler has quite powerful syntax allowing to specify assembly code. It is: asm("assembly_command %1,%0":"=constraint" (result) :" constraint" (argument), " constraint" (argument), ... :" clobbered_reg", "clobbered_reg" ... ); The assembly instruction is specified with aliases allowing to add names of C variables into it. it also allows to specify the CPU registers the command, we are inserting, might clobber. It is particularly important if we are expecting to compile with optimizing option or have register variables in the code. We don't want the program trying to use contents of a register we just destroyed by inline assembly instruction. If we are specifying the clobbered registers, on the other hand, compiler would make sure that no useful information is in this registers at the moment inline assembly code starts to execute. WATCOM C compiler has different provision for inline assembly: return_type c_function(parameter1,parametr2, ...); #pragma aux c_fucntion= \ "assembly_command" \ "assembly_command" \ parm [reg] [reg] \ value [reg] \ modify [reg reg]; It is implemented by providing sort of a pseudo c-function which is treated by the compiler pretty much like a macro. "parm" specifies which registers take in parameters from function's argument list, "value"- from what register, if any, the return value is taken and "modify", specifies clobbered registers. for DJGPP compiler code substituting line rendering loop: for(;beg<=end;beg++) *adr++=colour; /* rendering single line */ may look like: asm("movl %0,%%ecx" :: "g" (end):"%ecx"); asm("subl %0,%%ecx" :: "g" (beg):"%ecx"); asm("movl %0,%%edi" :: "g" (adr):"%edi"); asm("movl %0,%%eax" :: "g" (colour):"%eax"); asm("cld"); /* increment when stosb */ asm("rep"); asm("stosb %al,(%edi)"); /* rendering line here */ for WATCOM: void a_loop(unsigned char *adr,int end,int beg,int colour); #pragma aux a_loop= \ "sub ecx,esi" \ "cld" \ "rep stosb" \ parm [edi] [ecx] [esi] [al]; Not a terribly complicated code to write but something giving considerable speed increase. Then again we might have used a C library function memset for filling chunks of memory: void *memset(void *s,int c,size_t n); memset(l_adr,colour,end-beg); /* filling scan line */ This function might have had just that code, we wrote, inside of it. let's not believe assumptions neither pro nor con and go check that out. with DJGPP or, in this matter, most UNIX compilers we can just extract and disassemble memset code from libc.a (standard c library) and see for ourselves. let's go: ar -x libc.a memset.o objdump -d memset.o We would see something like: memset.o: file format coff-go32 Disassembly of section .text: 00000000 <_memset> pushl %ebp 00000001 <_memset+1> movl %esp,%ebp 00000003 <_memset+3> pushl %edi 00000004 <_memset+4> movl 0x8(%ebp),%edi 00000007 <_memset+7> movl 0xc(%ebp),%eax 0000000a <_memset+a> movl 0x10(%ebp),%ecx 0000000d <_memset+d> jcxz 00000011 <_memset+11> 0000000f <_memset+f> repz stosb %al,%es:(%edi) 00000011 <_memset+11> popl %edi 00000012 <_memset+12> movl 0x8(%ebp),%eax 00000015 <_memset+15> leave 00000016 <_memset+16> ret 00000017 <_memset+17> Address 0x18 is out of bounds. Disassembly of section .data: Very similar to what we wrote? Other compilers might have worse library code? it might be so, let's try WATCOM C, let's go: wlib clibs.lib *memset wdisasm memset.obj that's what we would see: Module: memset Group: 'DGROUP' CONST,CONST2,_DATA,_BSS Segment: '_TEXT' BYTE USE32 00000023 bytes 0000 57 memset push edi 0001 55 push ebp 0002 89 e5 mov ebp,esp 0004 8b 45 10 mov eax,+10H[ebp] 0007 8b 4d 14 mov ecx,+14H[ebp] 000a 8b 7d 0c mov edi,+0cH[ebp] 000d 06 push es 000e 1e push ds 000f 07 pop es 0010 57 push edi 0011 88 c4 mov ah,al 0013 d1 e9 shr ecx,1 0015 f2 66 ab repne stosw 0018 11 c9 adc ecx,ecx 001a f2 aa repne stosb 001c 5f pop edi 001d 07 pop es 001e 89 f8 mov eax,edi 0020 c9 leave 0021 5f pop edi 0022 c3 ret No disassembly errors ------------------------------------------------------------ Is it not very similar to what we just did using inline assembly? WATCOM C code is even trying to be smarter then us by storing as much as it can by word store instruction, since every word store in this case is equivalent to two char stores, there should be twice less iterations. Of course there would be some time spent on the function call itself but most likely performance of the inline assembly code we wrote and memset call would be very very similar. It is just yet another example of how easy it might be in some cases to achieve results by very simple means instead of spending sleepless hours in front of the glowing box (unfortunately only sleepless hours teach us how to do things by simple means but that's the topic for another story only marginally dedicated to 3-D graphics). By the way, in the provided source code there is an independent approach to fast memory moves and fills. That is functions to perform those are brought out into hardware interface as: HW_set_char ; HW_set_int; HW_copy_char; HW_copy_int. And their implementation may actually be either of the ways described above depending on the particular platform. Shaded polygons. ---------------- Shaded polygons are used to smooth appearance of faceted models, that is those where we approximate real, curved surfaces by a set of plane polygons. Interpolative or Gouraud shading (Gouraud, 1971), we would discuss, is the easiest to implement, it is also not very expensive in terms of speed. The idea is that we carry a colour intensity value in every vertex of the polygon and linearly interpolate those values to find colour for each pixel inside the polygon. Finally this is the place where implemented N-dimensional scanning procedure comes in handy. Indeed colour intensity can be pretty much handled as just an extra dimension as far as edge scanning and clipping are concerned. *I3 / \ / \ I0* \ \ *I2 \ I / IT1*---*---*IT2 \ / \ / \ / *I1 pic 3.9 We can obtain values on the left and right boarders of a polygon (IT1, IT2 on pic 3.9) by interpolating colour intensity value in every edge during edge scanning, just as "start/end' X values were found for every horizontal line. Afterwards when rendering certain horizontal scan line we can further interpolate "start" and "end" intensities for this line finding colour for pixels belonging to it (I on pic 3.9). Fixed point math is probably a very convenient way to implement this interpolation: void G_shaded_polygon(int *edges,int length) { int new_edges[G_MAX_SHADED_TUPLES]; /* verticaly clipped polygon */ int new_length; register unsigned char *l_adr,*adr; register int beg,end,i; register signed_32_bit cur_c,inc_c; /* current colour and it's inc */ GI_boarder_array_init(); /* initializing the array */ new_length=C_polygon_x_clipping(edges,new_edges,3,length); for(i=0;i<new_length;i++) GI_scan(&new_edges[i*3],3,3); /* Searching polygon boarders */ if(G_miny<=G_maxy) /* nothing to do? */ { l_adr=G_buffer+G_miny*HW_SCREEN_X_SIZE; /* addr of 1st byte of 1st line */ for(; G_miny<=G_maxy; G_miny++, l_adr+=HW_SCREEN_X_SIZE) /* rendering all lines */ { adr=l_adr+(beg=G_start[G_miny]); /* addr of the current line */ end=G_end[G_miny]; /* ends here */ cur_c=G_0_start[G_miny]; /* colour starts with this value */ inc_c=G_0_end[G_miny]-cur_c; if(end>beg) inc_c/=end-beg+1; /* f.p. increment */ for(;beg<=end;beg++) /* render this line */ { *adr++=cur_c>>G_P; /* rendering single point */ cur_c+=inc_c; /* incrementing colour */ } } } } As you see code above looks not that much different from ambient polygon rendering source. There is one small catch. I said that intensities can be handled pretty much as an extra dimension. In fact we can consider shaded polygon on the screen to take 2 space dimensions, and to have one colour dimension. But would all vertexes of this polygon belong to one plane in such 3-D space? If they would not, shading would look rather weird especially when this polygon would start to rotate. The common solution is limit polygons to just triangles. Why? well three points always belong to the same plane in 3-D space. Textured polygons. ------------------ It might seem that we can extend interpolation technique used to calculate colour intensity for shading, onto texture rendering. Just keep texture X and Y in every vertex of the polygon, interpolate those two along edges and then along horizontal lines obtaining this way texture (X,Y) for every pixel to be rendered. This however does not work... Or to be more exact it stops working when perspective projection is used, the reason is simple: perspective transformation is not linear. One may ask, why did it work for interpolative shading, after all we were using perspective transformation all along? The answer is it did not work, but visual aspect of neglecting perspective effect for shading is quite suitable, neglecting it for texture rendering however is not. I believe it has to do with different image frequencies. Just consider the following situation: a rectangular polygon is displayed on the screen so that it's one side is much closer to us then the other one, which is almost disappearing in the infinity: where do you think in the texture map lines A,B,C and D would be mapped from by linear interpolating method? how do you think the texture on the polygon would look like? +\ +--/---/+1 A|--\ A|/ / | B|----\ B|/ | C|----/ C|\ <------ Missed area D|--/ D|\ \ | +/ +--\---\+2 Polygon on the Texture Screen Map pic 3.10 Well it would look like if someone has carved triangular area marked "missed" on the picture above, and connected points 1 and 2 together, real ugly... In less dramatic cases texture rendered this way might look nicer, being perfect when effect of perspective transformation is negligible: all points are at almost the same distance from the viewer, or very far away from the viewing plane. To get always nice looking results we would have to consider what is really happening when texture is being rendered: u^ Texture Space | *T(u,v) +----> o v y ^ | z | U \ /*W(x,y,z)/ V World Space | / \ / | / \ / |/ * O +--------------> x j^ | | *S(i,j) Screen Space | +------->i pic 3.11 Consider three spaces pictured above: the texture space, world space (the one before projection) and finally contents of the screen - screen or image space. We may think of them just as original polygon/texture map in case of texture space, polygon/texture map turned somehow in case of world space and finally same projected onto the screen. If we know mapping of origin and two orthogonal vectors from texture space into the world space, we can express mapping of any point through them: x = Ox + v*Vx + u*Ux y = Oy + v*Vy + u*Uy z = Oz + v*Vz + u*Uz Where Vx...Uz are corresponding components of the respective vectors. Further point in the world space W(x,y,z) is being perspectively transformed into the screen space S(i,j): i = x / z j = y / z (the actual transformation used, would most likely also contain a multiplier - focus, but let's worry about particularities on some later stage). In order to perform mapping, on the other hand, we need u,v texture coordinates as function of screen coordinates i,j x = i * z y = i * z or: Ox + v*Vx + u*Ux = i * [ Oz + v*Vz + u*Uz ] Oy + v*Vy + u*Uy = j * [ Oz + v*Vz + u*Uz ] trying to express u,v through i,j: v*(Vx-i*Vz) + u*(Ux-i*Uz) = i*Oz - Ox v*(Vy-j*Vz) + u*(Uy-j*Uz) = j*Oz - Oy further: (i*Oz - Ox) - u*(Ux - i*Uz) v = ----------------------------- (Vx - i*Vz) (i*Oz - Ox) - v*(Vx - i*Vz) u = ----------------------------- (Ux - i*Uz) and: (i*Oz - Ox) - u*(Ux - i*Uz) ----------------------------- * (Vy - j*Vz) + u*(Uy - j*Uz) = j*Oz - Oy (Vx - i*Vz) (i*Oz - Ox) - v*(Vx - i*Vz) v*(Vy - j*Vz) + ----------------------------- * (Uy - j*Uz) = j*Oz - Oy (Ux - i*Uz) or in nicer form: i*(Vz*Oy-Vy*Oz) + j*(Vx*Oz-Vz*Ox) + (Vy*Ox-Vx*Oy) u = -------------------------------------------------- i*(Vy*Uz-Vz*Uy) + j*(Vz*Ux-Vx*Uz) + (Vx*Uy-Vy*Ux) i*(Uy*Oz-Uz*Oy) + j*(Uz*Ox-Ux*Oz) + (Ux*Oy-Uy*Ox) v = -------------------------------------------------- i*(Vy*Uz-Vz*Uy) + j*(Vz*Ux-Vx*Uz) + (Vx*Uy-Vy*Ux) At this point we have formulas of from where in the texture space the point in the screen space originates. There are several implementational problems, first few are simple, the actual perspective transformation is i=x*focus/z rather then i=x/z, plus, we are performing screen centre translation so that screen space (0,0) is in the middle of the screen and not in the upper left corner. Hence original dependency should have been: i = (x * focus) / z + HW_SCREEN_X_CENTRE j = (y * focus) / z + HW_SCREEN_Y_CENTRE And so finally, so called, reverse mapping formulas have to be amended respectively: ((i-HW_SCREEN_X_CENTRE)/focus) * (Vz* ... u = ------------------------------------------- ... ... Another, a bit difficult part has to do with texture scaling. If the size of texture vectors was picked to be 1 that would assume no scaling, that is unit size along the polygon in the world space would correspond to unit size in the texture space. What we however want to do in a lot of instances is to map smaller texture onto a bigger polygon or the other way around, bigger texture onto a smaller polygon. Let's try scaling the texture space: T *----------> v| * | S | | \ *---------------------> | | \ | v'| * | v T | *-- ---* | | \ ----- = ----- V- - - - - | | | \ | v' S | | \ | |mapping \ | v'*T | |of the \ v = ----- | |polygon \ | S | *---------------* V- - - - - - - - - - -| pic 3.12 now we can put expression for v into direct mapping equations: Vx*T Ux*T x = Ox + v'* ------ + u'* ------ S S ... Vx and Ux multiplied by T in fact are just projections of vectors enclosing the hole texture space, let's call them V'x == Vx*T, U'x == Ux*T etc. V'x U'x x = Ox + v'* ----- + u'* ----- S S ... considering this reverse mapping equations would look like: ((i-HW_SCREEN_X_CENTRE)/focus) * (V'z* ... u'= S * -------------------------------------------- ... ... This would allow us to scale texture on the polygon by changing S multiplier. Another problem has to do with point O - mapping of the origin of the texture space into the world space. The thing is, if the mapping of this point gets onto Z==0 plane mapping equations above no longer hold. Just due to the fact that perspective transformation is having singularity there. In general we deal with this problem of perspective transformation by clipping the polygon against some plane in front of Z==0. And do we really need mapping of exactly O of the texture space? Not necessarily, it may be any point in texture space assuming we change a bit direct mapping equations: x' = Ox + (v'-v'0)*V'x + (u-u0)*U'x ... where (v'0,u'0) are coordinates in the texture space of the point we have a mapping for in the world space that is (Ox,Oy,Oz). And the reverse mapping equations would be then: ((i-HW_SCREEN_X_CENTRE)/focus) * (V'z* ... u'= S * ------------------------------------------- + u'0 ... ... How can we obtain a mapping of, now, any point from the texture space into the world space? If we would associate with every polygon's vertex besides just x,y,z also u,v of where this vertex is in the texture, we can treat that as 5-D polygon and perform clipping on it, obtaining vertexes with coordinates suitable for perspective transformation, hence any of them would be fine for the mapping equations. (How to do clipping on polygons is covered in the next chapter). Last thing, in order to render regular patterns, those repeating themselves, we may want to introduce texture size parameter, and make sure that if we want to access texture outside this size, this access would wrap around to pick a colour from somewhere inside the texture. This can be easily done if texture size is chosen to be some power of 2, for in this case we can perform range checking with just logical "and" operation. However, using above equations for texture mapping appear to be quite expensive, there are few divisions involved per each rendered pixel. How to optimize that? The, perhaps, easiest way is: horizontal line subdivision, that is calculating real texture mapping every N pixels and linearly interpolating in between, this method is used in the implementation below: void G_textured_polygon(int *edges,int length,int *O,int *u,int *v, unsigned char *texture,int log_texture_size, int log_texture_scale ) { int new_edges[G_MAX_SHADED_TUPLES]; /* verticaly clipped polygon */ int new_length; signed_32_bit Vx,Vy,Vz; signed_32_bit Ux,Uy,Uz; /* extracting vectors */ signed_32_bit x0,y0,z0; signed_32_bit ui,uj,uc; signed_32_bit vi,vj,vc; signed_32_bit zi,zj,zc; /* back to texture coeficients */ signed_32_bit v0,u0; signed_32_bit xx,yy,zz,zzz; int xstart,xend; signed_32_bit txstart,tystart; signed_32_bit txend,tyend; signed_32_bit txinc,tyinc; register unsigned char *g_adr,*adr; register int i,x,y; signed_32_bit txtrmasc=(0x1<<(log_texture_size+G_P))-0x1; log_texture_scale+=G_P; GI_boarder_array_init(); /* initializing the array */ new_length=C_polygon_x_clipping(edges,new_edges,4,length); for(i=0;i<new_length;i++) GI_scan(&new_edges[i*4],2,4); /* Searching polygon boarders */ if(G_miny<=G_maxy) /* nothing to do? */ { x0=O[0]; y0=O[1]; z0=O[2]; u0=O[3]<<G_P; v0=O[4]<<G_P; /* world point <-> texture point */ Vx=v[0]; Vy=v[1]; Vz=v[2]; Ux=u[0]; Uy=u[1]; Uz=u[2]; /* extracting vectors */ ui=(Vz*y0)-(Vy*z0); uj=(Vx*z0)-(Vz*x0); uc=(Vy*x0)-(Vx*y0); vi=(Uy*z0)-(Uz*y0); vj=(Uz*x0)-(Ux*z0); vc=(Ux*y0)-(Uy*x0); zi=(Vy*Uz)-(Vz*Uy); zj=(Vz*Ux)-(Vx*Uz); zc=(Vx*Uy)-(Vy*Ux); /* back to texture */ g_adr=G_buffer+G_miny*HW_SCREEN_X_SIZE; /* addr of 1st byte of 1st line */ for(; G_miny<=G_maxy; G_miny++, g_adr+=HW_SCREEN_X_SIZE) /* rendering all lines */ { xstart=G_start[G_miny]; adr=g_adr+xstart; xstart-=HW_SCREEN_X_CENTRE; x=xstart; xend=G_end[G_miny]-HW_SCREEN_X_CENTRE; y=G_miny-HW_SCREEN_Y_CENTRE; xx=((y*uj)>>T_LOG_FOCUS)+uc; yy=((y*vj)>>T_LOG_FOCUS)+vc; zz=((y*zj)>>T_LOG_FOCUS)+zc; /* valid for the hole run */ if(( zzz=zz+((x*zi)>>T_LOG_FOCUS) )!=0) { txend=( (xx+ ((x*ui)>>T_LOG_FOCUS)) <<log_texture_scale )/zzz +u0; tyend=( (yy+ ((x*vi)>>T_LOG_FOCUS)) <<log_texture_scale )/zzz +v0; } for(;xstart<xend;) { x+=G_LINEAR; if(x>xend) x=xend; /* size of linear run */ txstart=txend; tystart=tyend; if(( zzz=zz+((x*zi)>>T_LOG_FOCUS) )!=0) { txend=( (xx+ ((x*ui)>>T_LOG_FOCUS)) <<log_texture_scale )/zzz +u0; tyend=( (yy+ ((x*vi)>>T_LOG_FOCUS)) <<log_texture_scale )/zzz +v0; } length=x-xstart; /* ends here */ if(length!=0) { txinc=(txend-txstart)/length; tyinc=(tyend-tystart)/length; } for(;xstart<x;xstart++) /* linear run */ { txstart&=txtrmasc; tystart&=txtrmasc; /* wrap around */ *adr++=*(texture+((tystart>>G_P)<<log_texture_size)+(txstart>>G_P)); txstart+=txinc; tystart+=tyinc; /* new position in the texture */ } } } } } Other possible speed-up techniques are: area subdivision involving 2-D interpolation, faking texture mapping with polynomials (later has not very much to do with the true mapping equations described here, however) and rendering texture along constant z lines (and not along horizontal line) the advantage gained by the former is that along every constant Z line perspective transformation is linear ( perspective transformation is i=x/z if z==const we can write it as i=coef*x where coef=1/z which is linear of course) The function above might be extended with interpolative shading as well. To do that special consideration has to be given to palette's layout, that is where and how to keep different intensities of the same colour. But once decided coding is quite straightforward. * * *